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Abstract. Based on the principles of importance sampling and resam- 
pling, sequential Monte Carlo (SMC) encompasses a large set of powerful 
techniques dealing with complex stochastic dynamic systems. Many of 
these systems possess strong memory, with which future information can 
help sharpen the inference about the current state. By providing theo- 
retical justification of several existing algorithms and introducing several 
new ones, we study systematically how to construct efficient SMC algo- 
rithms to take advantage of the "future" information without creating a 
substantially high computational burden. The main idea is to allow for 
lookahead in the Monte Carlo process so that future information can be 
utilized in weighting and generating Monte Carlo samples, or resampling 
from samples of the current state. 

Key words and phrases: Sequential Monte Carlo, lookahead weighting, 
lookahead sampling, pilot lookahead, multilevel, adaptive lookahead. 



1. INTRODUCTION 

Sequential Monte Carlo (SMC) methods have been 
widely used to deal with stochastic dynamic systems 
often encountered in engineering, bioinformatics, fi- 
nance and many other fields (Gordon, Salmond and 
Smith, 1993; Kong, Liu and Wong, 1994; Avitzour, 
1995; Hiirzeler and Kiinsch, 1995; Liu and Chen, 
1995; Kitagawa, 1996; Kim, Shephard and Chib, 
1998; Liu and Chen, 1998; Pitt and Shephard, 1999; 
Chen, Wang and Liu, 2000; Doucet, de Freitas and 
Gordon, 2001; Liu, 2001; Fong et al., 2002; Godsih, 
Doucet and West, 2004). They utilize the sequen- 
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tial nature of stochastic dynamic systems to gen- 
erate sequentially weighted Monte Carlo samples of 
the unobservable state variables or other latent vari- 
ables, and use these weighted samples for statistical 
inference of the system or finding the stochastic op- 
timization solution. A general framework for SMC 
is provided in Liu and Chen (1998) and Del Moral 
(2004). Many successful applications of SMC in di- 
verse areas of science and engineering can be found 
in Doucet, de Freitas and Gordon (2001) and Liu 
(2001). 

Dynamic systems often possess strong memory so 
that future information is often critical for sharpen- 
ing the inference about the current state. For exam- 
ple, in target tracking systems (Godsill and Vermaak, 
2004; Ikoma et al., 2001), at each time point along 
the trajectory of a moving object, one observes a 
function of the object's location with noise. Such ob- 
servations obtained in the future contain substantial 
information about the current true location, velocity 
and acceleration of the object. In protein structure 
prediction problems, often the objective is to find an 
optimal polymer conformation that minimizes cer- 
tain energy function. By "growing" the polymer se- 
quentially (Rosenbluth and Rosenbluth, 1955), the 
construction of polymer conformations can be turned 
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into a stochastic dynamic system with long mem- 
ory. In such cases, lookahead techniques have been 
proven very useful (Zhang and Liu, 2002). 

To utilize the strong memory effect, Clapp and 
Godsill (1999) studied fixed-lag smoothing using the 
information from the future. Independently, Chen, 
Wang and Liu (2000) proposed the delayed-sample 
method that generates samples of the current state 
by integrating (marginalizing) out the future states, 
and showed that this method is effective in solving 
the problem of adaptive detection and decoding in 
a wireless communication problem. The computa- 
tional complexity of this method, however, can be 
substantial when the number of future states be- 
ing marginalized out is large. Wang, Chen and Guo 
(2002) developed the delayed-pilot sampling method 
which generates random pilot streams to partially 
explore the space of future states, as well as the 
hybrid-pilot method that combines the delayed-sam- 
ple method and the delayed-pilot sampling method. 
Guo, Wang and Chen (2004) proposed a multilevel 
method to reduce complexity for the large state space 
system. These low-complexity techniques have been 
shown to be effective in the flat-fading channel prob- 
lem treated in Chen, Wang and Liu (2000). Doucet, 
Briers and Senecal (2006) proposed a block sam- 
pling strategy to utilize future information in gen- 
erating better samples of the current states. Zhang 
and Liu (2002) developed the pilot- exploration re- 
sampling method, which utilizes multiple random 
pilot paths for each particle of the current state to 
gather future information, and showed that it is ef- 
fective in finding the minimum-energy polymer con- 
formation. 

In this paper we formalize the general principle 
of lookahead in SMC. Several existing methods are 
then systematically summarized and studied under 
this principle, with more detailed theoretical justifi- 
cations. In addition, we propose an adaptive looka- 
head scheme. The rest of this paper is organized as 
follows. In Section 2 we briefly overview the gen- 
eral framework of SMC. Section 3 introduces the 
general principle of lookahead. Section 4 discusses 
several lookahead methods in detail. In Section 5 
we discuss adaptive lookahead. Section 6 presents 
several applications. The proofs of all theorems are 
presented in the Appendix. 

2. SEQUENTIAL MONTE CARLO (SMC) 

Following Liu and Chen (1998), we define a sto- 
chastic dynamic system as a sequence of evolving 



probability distributions ttq (xq ), tti (xi ),..., ttj (x^ ) , 
. . . , where Xf is called the state variable. We focus 
on the case when the state variable evolves with in- 
creasing dimension, that is, xj = {xo,xi, . . . ,xt) = 
{xt-i,xt), where xt can be multidimensional. For 
example, in the state space model, the latent state 
Xt evolves through state dynamic xt ^ gt{- \ Xf_i), 
and "information" yt ^ ft{- \^t) is observed at each 
time t. In this case, 

M^t) =p{^t I yt) 
t 

o^go{xo)Y[9s{xs |Xs„i)/s(ys |Xs). 
s=l 

In this paper we use the notation 7rt{xt \ x^^i) = 
p{xt I xt-i,yt) and iTt-iixt \ xt-i) = p{xt \ xt_i, 
y(_i). Usually, the goal is to make inference of a cer- 
tain function h{xt) given all past information = 

{yi,---,yt)- 

With all the information up to time t, we see that 
the minimum mean squared error (MMSE) estima- 
tor of h{^t), which minimizes E^^ [h — /i(xt)]^, is h = 
ET^^{h{xt)). When an analytic solution ET^^{h{xt)) 
is not available, an importance sampling Monte Carlo 

scheme can be employed (Marshall, 1956; Liu, 2001). 

(i) 

Specifically, we can draw samples xj , j = 1, . . . ,m, 
from a trial distribution rt{xt), given that r((x()'s 
support covers 7r((x()'s support, then E.,^^{h{xt)) can 
be estimated by 

l5:^?\(xp-)) or -^E-F^^(xF), 
j=i 2^j=i Wt j=i 

where w[''^ = wt{xf^) = vrt (xp^)/rt(xp^) is referred 

to as a proper importance weight for x^ with re- 
spect to 7rj(xf). Although the second estimator is 
biased, it is often less variable and easier to use since 
in this case wt only needs to be evaluated up to a 
multiplicative constant. Throughout this paper, we 
will use Xt and Xj to denote the true state and the 
Monte Carlo sample, respectively. 

The basis of all SMC methods is the so-called 
"sequential importance sampling (SIS)" (Kong, Liu 
and Wong, 1994; Liu, 2001), which sequentially builds 
up a high-dimensional sample according to the chain 
rule. More precisely, the sample x^ is built up se- 
quentially according to a series of low-dimensional 
conditional distributions: 

rt{xt) = qo{xo)qiixi \ xo)q2{x2 \ xi) • --qtixt \ xt-i). 
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The importance weight for the sample can be up- 
dated sequentially as 



0•)^ 



where 



X 



(i) ^ 



is called the incremental weight. The choice of the 
trial distribution rt (or qt) has a significant impact 
on the accuracy and efficiency of the algorithm. As a 
general principle, a good trial distribution should be 
close to the target distribution. An obvious choice 
of qt in the dynamic system setting is qt{xt j Xf_i) = 
TTt-ii.xt I xj_i) (Avitzour, 1995; Gordon, Salmond 
and Smith, 1993; Kitagawa, 1996). Kong, Liu and 
Wong (1994) and Liu and Chen (1998) argued that 
qt{xt I X(_i) = TTtixt I xt_i) is a better trial distribu- 
tion because of its usage of the most "up-to-date" 
information to generate xt. More choices of qt{xt \ 
xj_i) can be found in Chen and Liu (2000), Kotecha 
and Djuric (2003), Lin et al. (2005), Liu and Chen 
(1998), van der Merwe et al. (2002) and Pitt and 
Shephard (1999). 

As t increases, the distribution of the importance 
weight wt often becomes increasingly skewed (Kong, 
Liu and Wong, 1994), resulting in many unrepre- 
sentative samples of x^. A resampling scheme is of- 
ten used to alleviate this problem (Gordon, Salmond 
and Smith, 1993; Liu and Chen, 1995; Kitagawa, 
1996; Liu and Chen, 1998; Pitt and Shephard, 1999; 
Chopin, 2004; Del Moral, 2004). The basic idea is 
to imagine implementing multiple SIS procedures in 
parallel, that is, to generate {x|"^\ . . . ,Xj™^} at each 

step t, with corresponding weig 
and resample from the set according to a certain 
"priority score." More precisely, suppose we have 
obtained {(xp\u;p^),j = 1, . . . ,m} that is properly 
weighted with respect to 7rt(xj), then we create a 
new set of weighted samples as follows: 
Resampling scheme. 



(i) 

• For each sample xj , j = 1, 

ity score al > 0. 

• For j = 1, . . . ,m, 

— Randomly draw xj'^"'^ from the set {xj'",j 

1, . . . , m} with probabilities proportional to {ap^ 
j = l,...,m}; 



, m, assign a prior- 



Si) 



If X 



.Oo) 



then set the new weight associ- 



ated with X. to be W- 



<3) 



■ W 



Return the new set of weighted samples {(x. 



*(i) 



w 



),j = l,...,m}. 



This new set of weighted samples is also approx- 
imately properly weighted with respect to 7rj(xj). 

Often, al are chosen to be proportional to w). , 
so that the new samples are equally weighted. Some 
improved resampling schemes can be found in Liu 
and Chen (1998), Carpenter, Clifford and Fearnhead 
(1999), Crisan and Lyons (2002), Liang, Chen and 
Zhang (2002) and Pitt (2002). 

Resampling plays an important role in SMC. Cho- 
pin (2004) and Del Moral (2004) provide asymp- 
totic results on its effect, but its finite sample ef- 
fects have not been fully understood. Performing 
resampling at every step t is usually neither neces- 
sary nor efficient since it induces excessive variations 
(Liu and Chen, 1995). Liu and Chen (1998) suggests 
to use either a deterministic schedule, in which re- 
sampling only takes place at time T, 2T, 3T, . . . , or a 
dynamic schedule, in which resampling is performed 
when the effective sample size (Kong, Liu and Wong, 
1994) ESS = m/(l -|- vti^w)) is less than a certain 
threshold, where vt{w) is the estimated coefficient 
of variation, that is. 



:i) 



vt{w) 



In problems that the state variable xt takes values 
in a finite set A = {ai, . . . , duplicated samples 

produced in sampling or resampling steps result in 
repeated calculation and a waste of resources. Using 
an idea related to the rejection control (Liu, Chen 
and Wong, 1998), Fearnhead and Clifford (2003) de- 
veloped a more efficient scheme that combines sam- 
pling and resampling in one step and guarantees to 
generate distinctive samples. 

Most of the SMC algorithms are designed for fil- 
tering and smoothing problems. It is a challenging 
problem when the system has unknown fixed param- 
eters to be estimated and learned. Some new devel- 
opment can be found in Gilks and Berzuini (2001), 
Chopin (2002), Fearnhead (2002), Andrieu, Doucet 
and Holenstein (2010) and Carvalho et al. (2010). In 
this paper we assume all the parameters are known. 

3. THE PRINCIPLE OF LOOKAHEAD 

To formalize our argument that the "future" infor- 
mation is helpful for the inference about the current 
state, we assume that the dynamic system irt offers 
more and more "information" of the state variables 
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as t increases. A simple way to quantify this con- 
cept is to assume that the information available at 
time t takes the form yt = {yi,y2, ■ ■ ■ ,yt) and incre- 
ments to {yt,yt+i) at time 1. The dynamic sys- 
tem '7rt{xt) simply takes the form of 7rt(xt) = p{xt \ 
yt). Although this framework is not all-inclusive, 
it is sufficiently broad and our theoretical results 
are all under this setting. The basic lookahead prin- 
ciple is to use "future" information for the infer- 
ence of the current state. That is, we believe that 
E{h{xt) I yt+A) results in a better inference of the 
current state h{xt) than E{h{xt) \ yt) for any A > 0. 
Thus, if the added computational burden is not con- 
sidered, we would like to use a Monte Carlo estimate 
of E(h{'x.t) I yt+A) to make inference on hixt)- 

Here we study the benefit of the lookahead strat- 
egy rigorously. Let ht+A be a consistent Monte Carlo 
estimator of E.„^^^{h{xt)) = E{h{xt) \ yt+A) and 
ht+A is independent of the true state conditional 
on yt+A - The mean squared difference between h{xt) 
and its estimator ht+A, averaged over the Monte 
Carlo samples, the true state and the future obser- 
vations can be decomposed as 

E^^[ht+A -h{xt)f 

= E^^[ht+A - E{h{-Kt) I yt+A)]^ 

(2) 

+ E^,[E{h{^t)\yt+A)-h{^t)f 

^/(A) + //(A). 

As the Monte Carlo sample size tends to infinity, 
/(A), which is the variance of the consistent esti- 
mator, tends to zero. For 11(A), we can show the 
following: 

Proposition 1. For any square integrable func- 
tion h{-), //(A) decreases as A increases. 

The proof is given in the Appendix. 

When the Monte Carlo sample size is sufficiently 
large, /(A) becomes negligible relative to 11(A). 
Hence, the above proposition implies that a consis- 
tent Monte Carlo estimator of E(h(xt) \ yt+A) is al- 
ways more accurate with larger A when the Monte 
Carlo sample size is sufficiently large. 

However, this gain of accuracy is not always de- 
sirable in practice because of the additional compu- 
tational costs. Most of the time additional compu- 
tational resources are needed to obtain consistent 
estimators of E(h(xt) \ yt+A) with larger A. Fur- 
thermore, 1(A) sometimes increases sharply as A 
increases when the Monte Carlo sample size is fixed. 
More detailed analysis is shown in Section 5. 



In order to achieve the goal of estimating the looka- 
head expectation ETrt^^(h('x.t)) effectively using SMC, 
we may consider defining a new stochastic dynamic 
system with the probability distribution at step t be- 
ing the A-step lookahead (or delayed) distribution, 
that is, 

7r;(xt) =7rt+A(xt) 

(3) 

= j TTt+A(xt+A)d.xt+i- ■ -dxt+A- 

With the system defined by {vtq (xq ), 7rj|' (xi ),...} , the 
same SMC recursion can be carried out. 

In practice, however, it is often difficult to use this 
modified system directly since the analytic evalua- 
tion of the integration/summation in (3) is impos- 
sible for most systems. Even when the state vari- 
ables take values from a finite set so that Trt+A(^t) 
can be calculated exactly through summation, the 
number of terms in the summation grows exponen- 
tially with A. Nonetheless, the lookahead system 
{7rQ(xo), 7r|(xi), . . .} suggests a potential direction 
that we can work toward. 

There are three possible ways to make use of the 
future information: (i) for choosing a good trial dis- 
tribution rt(xt) close to 7r^(xt); (ii) for calculating 
and keeping track of the importance weight for xt 
using 7r^(xt) as the target distribution; and (iii) for 
setting up an effective resampling priority score func- 
tion at(xt) using information provided by 7rJ'(x(). 
Detailed algorithms are given in the next section. 

We note here that lookahead (into the "future") 
strategies are mathematically equivalent to delay 
strategies (i.e., making inference after seeing more 
data) in Chen, Wang and Liu (2000). In our setup, 
we assume that the current time is t -|- A and we 
observe yi, . . . , yt+A - In fact, some of the algorithms 
we covered were initially named "delay algorithms," 
under the notion that the system allows certain de- 
lay in estimation. The reason that we choose to use 
the term "lookahead" instead of "delay" is that we 
focus on sampling of xt , using information after time 
t (i.e., its own future). It is easier to discuss and com- 
pare the same xt when looking further into the fu- 
ture (increasing A), rather than a longer delay (with 
a fixed current time and to discuss the estimation of 
Xt-A with changing A). 

The lookahead algorithms we discuss here are clos- 
ely related to the smoothing problem in state space 
models where one is interested in making inference 
with respect to p(xt \ yi, . . . , yr) for t = 1, . . . ,T. 
Many algorithms, some are closely related to our 
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approach, can be found in Godsill, Doucet and West 
(2004), Douc et al. (2009), Briers, Doucet and Maskell 
(2010), Carvalho et al. (2010), Fearnhead, Wyncoll 
and Tawn (2010) and others. However, in this paper 
we emphasize on dynamicahy processing of p{xt \ 
7/1, ... , yt+A) for t = 1, . . . ,n. It has the characteris- 
tic of both filtering (updating as new information 
comes in) and smoothing (inference with future in- 
formation) . 

Another possible benefit of the proposed looka- 
head strategy is that it tends to be more robust 
to outliers, since the future information will correct 
the misinformation from the outliers. This is par- 
ticularly helpful during resampling stages. With an 
outlier, the "good samples" that are close to the 
true state will be mistakenly given smaller weights. 
Resampling according to weights will then be more 
likely to remove these "good samples." Lookahead 
that takes into account more information will be 
very useful in such a situation. 

A "true" lookahead would utilize the expected 
(but unobserved) future information in generating 
samples of current xj. The popular and powerful 
auxiliary particle filter (Pitt and Shephard, 1999) is 
based on such an insight, though it only looks ahead 
one step. Our experience shows that the improve- 
ment is limited with more steps of such a "true" 
lookahead scheme, as the information is limited to 
yi,. . . ,yt. Here we focus on the utilization of the 
extra information provided by future observations. 

4. BASIC LOOKAHEAD STRATEGIES 
4.1 Lookahead Weighting Algorithm 

Suppose at step t + A, we obtain a set of weighted 
samples {{^t+A''^t+A)^3 = l,-..,m} properly 
weighted with respect to TTt+Ai^t+A), using the stan- 
dard concurrent SMC. With the same weight = 

^i+A' partial chain is also properly weighted 
with respect to the marginal distribution 7rt+A(xf). 
Specifically, we have the following algorithmic steps. 



Lookahead weighting algorithm. 

At time t = 0, for j = 1,. . . , m: 

- Draw {x^^\ . . . ,x^^) from distribution qo{xo) • 

Uf=lQs{Xs\^s-l)- 



Set 



Wn OC 



vrA(x?: 



At times t = 1,2, . . . , suppose we obtained {{^t+A-i^ 
w^\),j = 1, . . . , TTi} properly weighted with respect 
to vrt+A-i(xt+A-i)- 

- ( Optional.) Resample with probability propor- 

(i) (i) 

tional to the priority scores al_i = wl_-^ to ob- 
tain a new set of weighted samples. 

- Propagation: For j = 1, . . . ,m: 

* (Sampling.) Draw x^'')/^ from distribution 



qt+A{xt+A I A^A-i)- Set xj^'^ = (xj:^^^_^ 



(i) 



X 



(j) 

t+A>- 



* (Updating weights.) Set 



U) (j) 
wl DC wi_^ 



vrt+A(x2^) 



■Kt+A-lK^t+A-Vlt+^y^t+A I ^t+A-l) 
Inference: ET^^^^{h{xt)) is estimated by 

m m 



„(i) 



9o(xo)nf=i9.(xi^'^|x2i 



Because the x^ are still generated based on the 
information up to step t, for example, qt{xt \ Xf_i) = 
iTt{xt I X(_i), and the future information is utilized 
only through weight adjustments; Chen, Wang and 
Liu (2000) called this method the delayed-weight 
method. Clapp and Godsill (1999) called the pro- 
cedure sequential imputation with decision step, as 
inference and decisions are made separately at dif- 
ferent time steps. 

The lookahead weighting algorithm is a simple 
scheme to provide a consistent estimator for 
£'-n-j_i^^(/i(xj)) with almost no additional computa- 
tional cost, except for some additional memory buffer. 
Hence, it is often useful in real-time filtering prob- 
lems (Chen, Wang and Liu, 2000; Kantas et al., 2009). 
However, when A is large, it is well known that 
such a forward algorithm is highly inaccurate and 
inefficient in approximating the smoothing distribu- 
tion TTt+Aixt) (e.g., Godsill, Doucet and West, 2004; 
Douc et al., 2009; Briers, Doucet and Maskell, 2010; 
Fearnhead, Wyncoll and Tawn, 2010; Carvalho et al., 
2010). 

4.2 Exact Lookahead Sampling 

This method was proposed by Chen, Wang and 
Liu (2000), termed as delayed- sample method. Its 
key is to use the modified stochastic dynamic sys- 
tem defined by vr^*(xt) = 7rt_|_A(x() in (3) to construct 
the importance sampling distribution. At step t, the 
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Fig. 1. Illustration of the exact lookahead sampling method, in which the trail distribution qt{xt = i \ i = 0,1, is 

proportional to the summation of TTt+2{xt = i,Xt+i,Xt+2 \ ^t'-i) f"^ Xt+i,Xt+2 ~ 0, 1. 



conditional sampling distribution for is chosen 
to be 

(4) qt{xt I xSi\) = I xli\) = ^<+A(xt I xji\), 

and the weight is updated accordingly as 



U) I ^(j) 



7rj+A_i(xii\) 

Figure 1 illustrates the method with xt A = {0, 1} 
and A = 2, in which the trial distribution is 



qt{xt = i I x^_;^^ 

= 7rt+2(xt = i |x|i\) 

= X] X] '^t+2{xt = i,Xt+i,Xt+2 I x|i\) 



Xt+l Xt+2 



oc ^ 7rt+2(x|i\,xt = +1,3:^4.2) 



for i = 0,1. 

The exact lookahead sampling algorithm is shown 
as follows. 



Exact lookahead sampling algorithm. 
At time t = 0, for j = 1,. 



, m: 



Draw Xq^ from distribution qo{xQ) 



Set wi^^=7TA{x'^>)/qo{x\^>) 



At times t = 1,2, . . . : 



( Optional.) Resample {'}S^}i,w^\,j = 1, . . . , m} 
(7 

with priority scores oq^i 
Propagation: For j = 1, . . 



■ w 



t~i- 



, m: 



* (Sampling.) Draw x'f'^ from distribution 
qt{xt I xji\) = -Kt+Aixt I x|i\) 
vrt+A(x|i\,xt) 



vrt+A(xJi\ 



* (Updating weights.) Set 



w 



w 



(j) 7rt+A(xJi\] 



"^t+A~l(xJi\) 
Inference: E-,^^_^^{h[ii.t)) "is estimated by 



iJ) 



i=i 



Specifically, for models with finite state space, the 
sampling and weight update steps in the exact looka- 
head sampling method involve evaluation of summa- 
tions of the form 

vrt+A(xt)= E '^t+Ai^t,Xt+i,. ■ ■ ,xt+A) 

Xt+l,...,Xt+A 

t+A 

(5) oc ^ go{xo)Y[9s{xs\^s-i) 



Xt+l,...,Xt+A 



s=l 



fsiVs I Xs). 
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For continuous state space, it is more difficult to 
adopt tliis approacli, as one needs to generate sam- 
ples from 



■~1Tt+A{xt I x[i\ 



(6) 



Xi+a) dxt+i ■ ■ ■ dxt+A 

and evaluate it in order to update the weight. A slight- 
ly different version of the algorithm was proposed in 
Clapp and Godsill (1999), termed as lagged time fil- 
tering density. Instead of calculating the exact sam- 
pling density (5) or (6), and sample from it, they 
proposed to use forward filtering backward sampling 
techniques of Carter and Kohn (1994) and Clapp 
and GodsiU (1997). 

As demonstrated in Chen, Wang and Liu (2000) 
and Clapp and Godsill (1999), the exact lookahead 
sampling method can achieve a significant improve- 
ment in performance compared to the concurrent 
SMC method. Chen, Wang and Liu (2000) provided 
some heuristic justification of this method. Here we 
provide a theoretical justification by showing that 
the exact lookahead sampling method generates more 
effective samples (or "particles") than any trial dis- 
tribution that does not utilize the future informa- 
tion. 



w. 



To set up the analysis, we assume that {(xj_^, 
j = 1, . . . , m} is properly weighted with re- 



spect to 7rt_i(xt_i) (not the lookahead distribution). 
We compare two sampling schemes. In exact looka- 
head sampling, x[^'-^^ is generated from TTt+A{xt \ 



Xj{_^-^ ) , and X 



(x 



(i) 



is properly weighted 



with respect to 7rt+A(x() by weight 



(7) 



W- 



(ij) 



w 



t-i 



vr.+A(xa) 
vrt-i(xa) 



(2 7) 

Let sample be generated from a trial distri- 

bution qt{xt I ^t-i) that uses no future information, 
that is, qt{xt \ does not depend on yt+i,..., 



yt+A, then x 



(2,i) 



(xj'^^, xj^''^^) is properly weighted 



with respect to 7rt+A(x() using the weight 



(8) 



w 



(2,i) 



: W 



(j) 

i-1 



vr,_i(xa)g,(xrMx 



(2,i) I (j) 



Let the subscript tti+a indicate that the corre- 
sponding operations are to be taken conditional on 
yt+A, and let 

= J h{xl^\,xt)Trt+Aixt I x|i\) dxt. 

We have the following proposition: 
Proposition 2. 



(9) 
and 

(10) 
(11) 



var. 



(w 



(2,i)^ 



> var. 



(w 



,(1 j) ; 



var^, , , Iwi"^'^^ ET,^^^{h{-Kt) \ xt_i = x|i\ 



i=xa) 



> var^,^^[u;J^'-'^S^^^^(/i(xt) | x^^i = x^i\) 



.(i) 



The proof is presented in the Appendix. 
Note that the right-hand sides of (10) and (11) 
use the Rao-Blackwellization estimator 



,(/i(xO|xi_i=xSi\). 
For finite state space, it is often achievable since 
^^i+A(^(xt) |xt-i =xji\) 



\A\ 

i=l 



h{-x.^^\,xt = aij-Kt+Aixt = ai \ xji\ 



(i) 



where 1^1+ A {xt = «i I '^^^t-i ) have been computed dur- 
ing the propagation step. Also note that (10) does 

not provide a direct comparison between YllLi 



(iJ) 



/i(x?)) and E 



m 



^^'^^/i(xp^). This is because the 
sampling efficiency is also related to function h{-). 
If h{xt) does not depend on xt, then (10) indeed 
shows that the full lookahead sampler is always bet- 
ter. Otherwise, this proposition suggests to use 



1 



m 



<^E,,^^(/l(Xi)|Xi_i=xSi\ 



Em 
.7 = 1 



W- 



t j=i 



for estimation in the exact lookahead sampler. 

As a direct consequence of Proposition 2, the fol- 
lowing proposition shows that exact lookahead sam- 
pling is more efficient than lookahead weighting. Sup- 



t-iJ 



pose in lookahead weighting sample x 



(3j) 



X 



(2,i) 
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{^x.[•'2-^^,xl^'^'^) is available at time t and xJ^'J:'^^^ 
generated from 



(3,j) 



IS 



t+A 

n 

s=i+l 



qs[Xs 



A3,j) 



Xt+l:s-lJ 



in the next A steps. Let x^j^^ = (x. 



(3,j) 



then the weight corresponding to the lookahead 
weighting algorithm is 



w 



(3j) 



: W 



U) 
t-1' 



^t+A(xS+2] 



We have the following proposition: 
Proposition 3. 



(3j) I (3,j).- 



^s-l 



•^'•^■^)>var.,^Ju;f'^-)) 
and for any square integrable function h{xt), 



var^t+Al^t 



{3,j),. (j) J2,j). 



1 -^t 



>var^t+AK''^^W_i,xJ 



The proof is presented in the Appendix. 

In the exact lookahead sampling, the incremental 
weight C/t = 7rf+A(xt_i)/7rj+A-i(xf-i) usually will 
be close to 1 when A is large, so the variance of 
weights typically decreases as A increases (Doucet, 
Briers and Senecal, 2006). The benefit of exact looka- 
head sampling, however, comes at the cost of in- 
creased analytical and computational complexities 
due to the need of marginalizing out the future states 
xt+i, . . . , xt+A in (3). Often, the computational cost 
grows exponentially as the lookahead step A in- 
creases. 

4.3 Block Sampling 

Doucet, Briers and Senecal (2006) proposes a block 
sampling strategy, which can be viewed as a varia- 
tion of lookahead. A slightly modified version (under 
our notation) is given as follows. 

Block sampling algorithm. 

• At time t = 0, for j = 1,. . . , m: 

- Draw {x\^\ . . . ,x'^^) from distribution qo{xo) ■ 



Us=l9s{Xs I X^_i). 

Set 



vrA(x?) 



90(^0) nil ^.(^i'Mxi!^! 



At times t = 1,2, . . . : 

- ( Optional.) Resample {y^^^^^_l,w^\, j = 1, . . . . 

m} with priority scores aj'i^ = w^t\ • 

- Propagation: For j = 1, . . . ,m: 

* (Sampling.) Draw ^^^^^^ from qti'^t^t^A I 



^t+A-l/- 
* (Updating weights.) Set 

— ^t-l^t+A[.-X-t-l^^t:t+A) 

. xj^U) I „(i) „*(i) \ 
/(vr,+A-i(xa,x(J^^_,) 

- Inference: £^7rj^^(^(xt)) is estimated by 

m m 

^4^\(x?ve-?^- 

Here \ x|i\, x*^-'^^) is called the arti- 

ficial conditional distribution. 

Doucet, Briers and Senecal (2006) suggested that 

one should choose q't(x*.^/^^ | ^^l^^i) = Qti^utl^ I 

'x.^ili), that is, the trial distribution does not depend 

(7) 

on y^Xt+A-v Then the optimal choices of qt and \t 
are 

T^t+A\^t:t+A I ^t-i) 



H:t+A I '^t-l^'^t:t+A~li 



and 

^HXf.t+A-i I ^t-l^-^t-.t+A) — ^t+^-^\-^t:t+A~l I ^t-lJ- 
Note that, in this case, the marginal trial distribu- 
tion of x^^"*^ is 7rf+A(xj^"''' I ^[''li), and the weight is 
updated by 



W- 



■ W- 



(j) 7rt+A(xii\) 
7rj+A_i(xJi\) 



In this case, the blocking sampling method becomes 
the exact lookahead sampling. 
In practice, we can use 

■ T^t+A[^t:t+A I ^t-l) 



9t(x: 

and 



■t:t+A \^t-l^^t:t+A-l/ 



\ / (i) I (i) *(i) N 

- ^t+^-n^t:t+A,t-l I ^t-lJ 
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which are low complexity approximations of the op- 
timal qt and A^. 

4.4 Pilot Lookahead Sampling 

Because of the desire to explore the space of future 
states with controllable computational cost, Wang, 
Chen and Guo (2002) and Zhang and Liu (2002) 
considered the pilot exploration method, in which 
the space of future states {xj+i, . . . , xj+a} is par- 
tially explored by pilot "paths." The method could 
be viewed as a low-accuracy Monte Carlo approxi- 
mation to the exact lookahead sampling method. 

The method was introduced for the case of fi- 
nite state space oi xt ^ A = {oi, . . . in both 
Wang, Chen and Guo (2002) and Zhang and Liu 
(2002). Specifically, suppose at time t — 1 we have 

a set of samples {{^\^}i,w[^\),j = l,...,m} prop- 
erly weighted with respect to 7rj_i(xf_i). For each 

and each possible value Oj of xt, a pilot path 



X 



{hi) 
t:t+A 



sequentially from distribution 



'^I+a) constructed 



t+A 



(12) n 



pilot 



X 



(i) 



i,xt — ai,xt+i;s_i). 



s=t+l 



Then, xl can be drawn from a trial distribution 
that utilizes the "future information" gathered by 
the pilot samples ^I'^^^lt^^, i = I, . . . ,\A\. 

Figure 2 illustrates the pilot lookahead sampling 
operation, with A = {0, 1} and A = 2, in which the 

pilot path for (xj'i^, = 0) is {xt+i = l,xt+2 = 1) 



and the pilot path for {'x\''_}-^,xt = 1) is {xt+ 
xt+2 = 1); both are shown as a dark path. 
The single pilot lookahead algorithm is as follows. 

Single pilot lookahead algorithm. 



0, 



• At time t = 0, for j = 1, 



, m: 



Draw Xq from distribution qQ{xo)- 
Setw\^^ = Mx'i^)h^{xf). 
Generate pilot path yi'i'^ from W^^iQs^^"^ {xg 
Xq"'\xi:s_i) and calculate 

^A{xf,x^l;S) 



aux(j) _ (j) 







^0(2:0 ^)nf=i 9s 1 4^\xi;s-i) 

At times t = 1, 2, . . . .■ 

- ( Optional.) Resample {y-f-i > ""^l-i 1 J = 1; • • • 1 "^1 
With priority scores a^Lx = w^_^ . 

- Propagation: For j = 1, . . . ,m: 

* ( Generating pilots.) For xt = Oi, i = 1, . . . , \ A\, 
draw ^[''l^i-t^^ from (12) and calculate 



n+A (x^i ,xt = ai, xj^'^,^^ 
/(^,_i(xa)Qp)), 



(13) 



where 



t+A 
s=t+l 
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* (Sampling.) Draw from distribution 



qt{xt = tti I xji\ 



t 

iJ) 



2-^k=i 



(i) 

Sample x). is then generated from distribution 

j-Aj,i,k) 



(15) qt{xt 



: Oi X 



(i) ^ 



v^l^l ■sr^A' TT{j,i,k) ' 



* (Updating weights.) We will keep two sets of 
weights. Let 



The corresponding weight and auxiUary weight are 
updated by 



(i) _ „„(i) 



TTtlX 



vr,_i(xSi\),,(.P|xSi\ 



W- 



■ W- 



t-i 



^'\)qt{x\'^ 



X 



(i) ^ 



and 



and 



w 



aux(j) _ (j) 



A:=l 



aux(j) 



1 



a: 



(14) 



Inference: ET^^^^{h{xt)) is estimated by 



1 fc= 



Em aux(j) 



respectively. 

Similar to the conclusion of Proposition 4, samples 
(xp\ if^"^^"''') are properly weighted with respect to 



In addition, we have the following propo- 



In the algorithm we maintain two sets of weights. 
The weight wf is being updated at each step, and 
the sample (xf,ii)p^) is properly weighted with re- 



7rt+A(xi) 
sition: 

(1 i) 

Proposition 5. Suppose sample x^ is gener- 
ated by the exact lookahead sampling algorithm with 
weight w^'^^ as in (7). Denote {yLf'^\w^'^\w^^^^^^) 



spect to 7rt(xt), but not 7rt+A(xj). A second set of as the weighted samples from the k-pilot lookahead 



weights, the auxiliary weight w^^^^''\ is obtained for 
resampling and making inference of E-,^^^^{h{ii.t))- 
We have the following proposition: 

Proposition 4. The weighted sample (xp-*, 
y^aux(jr)^ obtained by the single-pilot lookahead algo- 
rithm is properly weighted with respect to -Kt+Ai^t), 
and estimator (I4) is a consistent estimator of 

The proof is given in the Appendix. 

The pilot scheme can be quite flexible. For exam- 
ple, multiple pilots can be used for each , xt = 
ai). This would be particularly useful when the size 
of the state space A is large. Specifically, for each 



algorithm and 
tal weights, then 



are the cumulative incremen- 



< var 



var. 



0{1/K) 



and 



< var 



w 



K 

=1 k 



XMx,)|x,_i=xSi\)] 



0{l/K). 



(xj'i^, Xt = aj), multiple pilots x^^-^.j_(_^ 
are generated from distribution (12) independently 
and the corresponding cumulative incremental 

weig hts Ul^'"'^'^ are calculated by 



k = l. 



[/P''=)=vrt+A(xa,xt 
/(vr,„i(xSi\)Qi 



where 



Q 



t+A 

n 

s=t+l 



pilot I ^U) 



xr 



The proof is in the Appendix. 

This proposition shows that the variance of the 
weights under the multiple-pilot lookahead sampling 
method is larger than that under the exact looka- 
head sampling method, but converges to the latter 
at the rate of 1/K as the number of pilots K in- 
creases. As a consequence, the samples generated by 
the multiple-pilot lookahead sampling method are 
more effective than the samples generated by the 
lookahead weighting method when pilot number K 
is reasonably large. 

When the state space for xt is continuous, it is 
infeasible to explore all the possible values of xt. 
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Finding a more efficient method to carry out looka- 
head in continuous state space cases is a challenging 
problem currently under investigation. 

One possible approach is the following simple al- 
gorithm. For each j, draw multiple samples of xl , 
i = 1, . . . , A, from qt{xt \ xj'i^-^) and treat this set as 

the space of (the possible values can take). 

Then we run single or multiple pilots from each 

(7) 

of these values and sample xf according to the 
lookahead cumulative incremental weights, just as 
in the discrete state-space case. In the special case 
of j4 = 1 , the sampling distribution of this lookahead 
method will be the same as that in the concurrent 
SMC, but one would use the lookahead weight as 
the resampling priority score at time t. 

An improvement of this approach for the contin- 
uous state-space case can be achieved if the dimen- 
sion of xt is relatively low and when the state-space 
model is Markovian. That is, 



and 



gt{xt I xt-.i) = gtixt I xt-i) 



ftivt I Xt) = ftivt I Xt). 



(hi) 



In this case, the cumulative incremental weight 
of the pilot (xp'*\ x|'^*^'*.j_^^) can be written as 

— 7rt+AlXj_]^,Xj i^t+i-.t+A) 

/(vr,_i(xSi\)Qp)) 
oc5i(xp)|xa)/,(yt|xp)) 

s=t+l 



^T/(i,i)T/(j,i) 



where 



gp) = ,,(xp)|xa) n 

s=t+l 



pilot I ^(J'*^))^ 



and 



T-rt+A pilot/ I 



(i) 

Standard procedure would choose xf from the gen- 



erated X 



t 5 



with probability c4^'*V 



Uf' . However, note that 



V 



(16) 



{3,i) 

t+l:t+A 



A 



g{xt+i I x|^'*V(y(+i I xt+i 
i+A 

• JJ^ gs{xs I Xs„i) 

s=t+2 



f{ys \xs)dxt+i---dxt+A 



only depends on xp'*\ and 



.,+A(xa,Xp'^)) 



7rt-i[:>^t-i)Qt[Xt \^t~i) 
is the lookahead cumulative incremental weight in 
(8), which is shown to be more efficient than [7^^-''*^ = 

Proposition 3. 

— (7 *) 

With a Markovian model, y/^i-t^/^ is the function 
of xp'^^ and V'/_^'^*.''^^^ can be considered as a noisy 
version of Vt+i:t+A{x['^'^^). That is, one can write 



t+l:t+A 



where 



E(e 



0. 



Hence, if the dimension of xt is small, one can smooth 

(7 i) 

'^t+i't+A ™ space of Xt to obtain an estimate 

— (7 i) 

of l^t+i-t+Ai using all the pilot samples. The esti- 
mate is then used for sampling and resampling. For 
example, let V'^+'i t+A ^ nonparametric estimate 



of y^^lt+A and let 
choose Xj''^ from xp'*\ i 



{hi) 



= yt''%+i-,+A- One can 
1, . . . , ^, with probability 

^'*V ^ and weight it accordingly. Experi- 
ence shows that a very accurate smoothing method 
(e.g., kernel smoothing) is not necessary, as to con- 
trol computational cost. Often a piecewise constant 
smoother is sufficient. 

4.5 Deterministic Piloting 

It is also possible to use deterministic pilots in 
the pilot lookahead sampling method. For example, 
at time t, the pilot starting with {y^^}^-^,xt = Oj) for 
each A can be a future path ^tJ^i-tj^A ^^^^ max- 
imizes '/rt+A(xt+i:t+A I 'y^^t-i^^t ~ ^i)- Since such a 
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global maximum is usually difficult to obtain, an 
easily obtainable local maximum is to sequentially, 
for s = t + 1, . . . ,t + A, obtain 



(17) 



arg max vr^ (x^ | x|-^^-^ ,xt = ai,x 



t+l:s~l) 



Once the pilots are drawn, the remaining steps 
are similar to those in the random pilot algorithm, 
except that there is usually no easy way to obtain 
a proper weight with respect to -Kt+Ai^t)-, though a 
proper weight with respect to vrt(xf) is easily avail- 
able. In order to make proper inference with respect 
to 7rj_|„A(xt), one can generate an additional random 
pilot path to Xt+A- Specifically, we have the follow- 
ing scheme. 



Deterministic pilot lookahead algorithm. 

At time t = 0, for j = 1, . . . , m; 

— Draw Xq^ from distribution qQ{xQ). 

— Generate deterministic pilots ^i.'^ sequentially 
by letting 



X 



(i.*) 



/ I (7) (1,*) 
: argmax7rs(Xs | 



for s = l,...,A. Let U^''*^ = 7ta{x'^' A''^')/ 

- Setwl'''^'^=wi'^ul,''*\ 
At times t = 1,2, . . . : 

- (Optional.) Resample {(x|'i\,i(;|'i\),j = 1, . . . , 

m} with priority scores = wl'^[''^ . 

- Propagation: For j = 1, . . . ,m: 

* ( Generating deterministic pilots.) For xt = 

ai, i = 1, . . . ,\A\, obtain "^^tJ^l-ij^^ sequentially 
using (17) for s = t + 1, . . . ,t + A. 

* (Sampling.) Draw from distribution 



qt{xt = ai I x|i\; 



where 



(hi) 



I (j) 

vrt-i(xa) 



i=l 



{hi) 



* (Updating weights.) We keep three sets of 
weights for concurrent weighting, resampling 
and estimation. 



(1) Goncurrent weight: 



,00 - ,„0) 



"'*-'"vr,_i(xSi\)g,(xp)|xa)' 

(2) Resampling weight 

4 = 1 

(3) Auxiliary weight: draw from 



t+A 



s=t+l 

and calculate 



w 



aux(j) _ (j) 



(i) Aj) ^aux(j) 



where 



t+A 



^auxu; _ TT aux/ aux{i) i ij) U) aux(j) s 
— J_J_ Hs l-^s l-X-j-D-^j ,Xt+l:s-lJ- 

s=t+l 

- Inference: Ej^^^^{h{xt)) is estimated by 

m m 

The above algorithm requires the generation of an 
additional random pilot x^"^^^|^^ to obtain w^^^^^\ 
which is properly weighted with respect to 7r(-|_A(x(). 
Alternatively, one can combine the deterministic pi- 
lot scheme and the lookahead weighting method in 
Section 4.1 to obtain a consistent estimate of 

The resampling weight w^-^^^^^ is served as the pri- 
ority score for resampling when needed. It retains 
the information from the deterministic pilot and avoids 
the additional random variation from the additional 
sample path required by the auxiliary weight w'l"^^^^ . 

The deterministic pilots are useful because they 
gather future information to guide the generation 
of the current state Xf. In some cases, the deter- 
ministic pilots can provide a better approximation 
of the distribution 7r(_|_A(xf | x^^^^-i^) than the random 
pilots, especially when we can only afford to use a 
single pilot for each {■}S^}-^,xt = ai). In addition, with 
some proper approximation, the deterministic pilot 
scheme may have lower computational complexity. 
The example in Section 6.1 uses a low complexity 
method to generate the deterministic pilots. 
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Fig. 3. Illustration of multilevel structure m a 16-QAM 
modulation. 

4.6 Multilevel Pilot Lookahead Sampling 

In case of finite state space, when the size of the 
state space A is large, the pilot lookahead sam- 
pling method can still be too expensive. To reduce 
the computational cost, we introduce a multilevel 
method, which constructs a hierarchical structure 
in the state space and utilizes the lookahead idea 
within the structure. Guo, Wang and Chen (2004) 
developed a similar algorithm. 

Specifically, at time t, we first divide the current 
state space A of xt into disjoint subspaces on L + 1 
different levels, that is, 

^ = C/,iUCi,2U---UC/,Dp / = 0,...,L. 

In the division, each level-/ subspace Ci^i consists 
of several level- (Z + 1) sets C;+ij-. On the top level 
(level-0), Co,i = A. On the lowest level (level-L), 
each Cl,! only contains a single state value a, € A. 
For example, in a 16-QAM wireless communication 
problem (Guo, Wang and Chen, 2004), the trans- 
mitted signal Xt to be decoded takes values in space 
A = {ai = (oj,!, ai,2) : ai,i, ai,2 = ±1, ±2}. Figure 3 de- 
picts a multilevel scheme where the state space is 
divided into three levels (L = 2), 

^ = Co,l=Ci,iUCi,2UCi,3UCi,4 



16 



16 



\JC2, = \J{a,]. 



i=l 



(i) 

At time t, instead of sampling xf directly, we 



generate a length L index sequence {/^'^i , . . . 



r(i)l 



in which l9) indicates that xl'" belongs to level-Z 



subsets C (j) . A valid index sequence {I^''^ , . . . , /)"V} 



needs to satisfy C o) C C (j) , Z = 1, . . . , L. The 

'■'-'t,i '■~'-'^t,i~i 

(i) (i) 
last indicator £ specifies the value of the 

level-L subset C (j) only contains one state value. 

The index sequence , . . . , I^''^ } is generated se- 
quentially, starting from the highest level, following 
the trial distribution 

L 

l[qt,i{It,i\^ii^^,It,l-l)- 
1=1 

Here we define Itfl = 1, which coincides with xt G 
Co,i = A. The index sampling distribution qt,iilt,i \ 

/f,i_i) can be constructed as follows, using a 
pilot scheme. 

For every i such that Ci^i cCi_i^/^,_^, randomly 

draw a pilot path (xp'*^ , xJi*^ , . . . , xi'','^! ) from the 



trial distribution 



qf°\xt\x.\^J^,It, 



(18) 



(j) 



t+A 



• n '?f°*(^.ixa,x,, 

s=t+l 

where q\^^"^{xt \ x|'^\,/( ; = i) indicates that Xt must 
be a member of C; j, and calculate 



-,:s — l I 



^pilotO'.i)^ 
't,l ) 



(19) 
where 



QPilot(j,j) 



i+A 



^t-l'-'t,! 



i=i+i 



(7) 

Then sample y is generated from distribution 
(20) 



„ (T _ „• I „(i) r(i) ^ 



tl 



E 



t,l 



Specifically, the algorithm is as follows. 

Multilevel pilot algorithm. 

At time t = 0, for j = 1, . . . , m: 

- Draw Xq^ from distribution qo[xo). 

- 5ei4^) = 7ro(4^'V'7o(4'^). 
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Generate pilot path x^*!^^ from n^i9s"°^( 
Xq"'^ xi:s_i) and calculate 



pilot / 



U, 



pilot / 



Si) 



n , aux(j) (i)rrOj*) 
- Set ^ —WqUq '. 

At time t = 1,2, . . . : 



( Optional.) Resample {'x.[''}-^,w[''}i,j = 1, . . . , m} 

with priority scores a^''}^ = w^"^*'"'^ . 
Propagation: For j = 1, . . . ,m: 



* Sett 



ii) 



1. For level I = 1,2, . . . ,L: 



( Generating pilots.) For each i such that 
Ci^i C C u) , generate pilot {x[ 



X 



t+l:t+A. 



from distribution (18) and 



t,l 



is calculated as in (19). 
■ (Sampling.) Draw from the trial dis- 

tribution (20). 

* (Updating weights.) If chosen at 

last, that is, C (j) = {ajg}, let 



w 



(i) _ 



(i) 



Si) 



W- 



aux(j) 



: W 



(i) 
t-1 



t,L 



nL „ fj-ij) I Aj) (j) ^ 



Inference: E^^^ ^{h{xt)) is estimated by 



w 



aux(j) 



The advantage of the multilevel method is that it 
reduces the total number of probability calculations 
involved in generating xi . For example, generat- 
ing directly from trial distribution qt{xt \ Xj'L\) 
requires a total of |^| evaluations of qt{xt = | 

Xj'i^), i = l,...,\A\. On the other hand, generat- 
ing {I^fl,. . -Jtl) only requires Ef=i "-(4,^-1 ) such 
evaluations, where n{I^^^_^ is the number of level-Z 
subsets contained in level- (/ — 1) subset C (j) . In 

the example illustrated by Figure 3, IK is chosen 
from a set of four subgroups at the first step. Given 
a selected I^ is drawn from a set of four ele- 
ments under I^-^^ . Hence, n{Itfi) = 4 and n{It^i) = 4. 
In this example, a total of 8 probabilities need to be 



evaluated, reduced from 16 if xl were generated 
directly. More generally, if \A\ =4^^, we can reduce 
the computation to 4L evaluations based on such a 
multilevel structure. 

As discussed in Section 4.5, a deterministic pilot 
can also be used in the multilevel method. A mul- 
tilevel pilot lookahead sampling method using de- 
terministic pilots is applied to the signal detection 
example in Section 6.1. 

4.7 Resampling with Lookahead and Piloting 

As discussed in Liu and Chen (1995), Liu and 
Chen (1998), although a resampling step introduces 
additional Monte Carlo variations for estimating the 
current state, it enables the sampler to focus on im- 
portant regions of "future" spaces and can improve 
the effectiveness of samples in future steps. Liu and 
Chen (1998) suggested that one can perform resam- 
pling according to either a deterministic schedule or 
an adaptive schedule. In the following, we consider 
the problem of finding the optimal resampling pri- 
ority score if resampling only takes place at time 
T, 2T, 3T, . . . (i.e., a deterministic schedule). 

Suppose we perform a standard SMC procedure. 
At time t = nT, samples {(xp\ ■wp^), j = 1, . . . ,m} 
properly weighted with respect to 7rt(x() are gener- 
ated, in which xp^ follows the distribution r((x(). 



and w 



U) 



wt(x 



U)) 



7rt(xp^)/rf(x^-^''). We perform 

Si)) 



Si) 



a resampling step with priority score 6(xJ ), then 

the new samples xj'^'^^j = l,...,m, approximately 
follow the distribution V'(xf) that is proportional to 



*{i) 



rt{xi,)b{xt). In the following T steps, x 
is generated sequentially from distribution qs{xs 



<i) 



t+l ) • • • 5 -^t+T 



X 



<i)^ 



t + 1, . . . ,t + T, then the corresponding 

*ii) 



weight of x^_p^ with respect to 7rt_|_r(xf+r) is 
r *(i) \ 

_ ^*+t(x^J) 



*{i) I *ii)) 



(X 



/ * 



*{i)\ 



n(x:^^>,(x:^^^) 

/ *ii) \ 



The following proposition concerns the choice of pri- 
ority score b{xt) that minimizes the variance of weight 

/ *U) \ 
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Proposition 6. The variance of weight 



/ *(i) \ 

(21) 

where 



is minimized when 



vrt(xi)n 

t+T 

s=t+l 



t+T , 



l) dx 



t+i 



■dxt+T- 



The proof is in the Appendix. 

Specifically, if we perform resampling at every step 
(T = 1), and the trial distribution is qs{xs \ x<j_i) = 
-Ksixs I Xs_i), the optimal priority score becomes 



7rt+i(xt) 



vri(xt) 

which is the priority score used in the sequential 
imputation of Kong, Liu and Wong (1994) and Liu 
and Chen (1995), and the auxiliary particle filter 
proposed by Pitt and Shephard (1999). 

When T > 1, the exact value of rjt^Ti^t) in (21) 
is difficult to calculate. In this case, one can use 
the pilot method to find an approximation. For each 

= l,...,K, 



(?) (i i) 

sample x^ , multiple pilots x.Y^i.f-_^_rp 



are generated following distribution HsiT+i^sl^* 



with the cumulative incremental weight 



{hi) 



7rt+T(x 



(i) 



vrt(xF)n 



=i+l 



I Xf ,y<^i^i.s_i) 



Then r?(xp^) can be estimated by R-^ E£i(t^t^''¥- 

4.8 Combined Methods 

The lookahead schemes discussed so far can be 
combined to further improve the efficiency. For ex- 
ample, Wang, Chen and Guo (2002) considered a 
combination of the exact lookahead sampling and 
the pilot lookahead sampling methods. In this ap- 
proach, the space of the immediate future states is 
explored exhaustively, and the space of further fu- 
ture states is explored using pilots. 

5. ADAPTIVE LOOKAHEAD 

Many systems have structures with different local 
complexity. In these systems, it may be beneficial 
to have different lookahead schemes based on local 



information. For example, in one of the wireless com- 
munication applications, the received signal yt can 
be considered as following 

where {vt} is white noise with variance cj^, {xt} is 
the transmitted discrete symbol sequence and {^t} is 
the fading channel coefficient that varies over time. 
Since {^f } varies, the signal-to- noise ratio in the sys- 
tem also changes. When \^t\ is large, the current 
observation contains sufficient information to de- 
code Xt accurately. In this case, lookahead is not 
needed. When is small, the signal-to-noise ra- 
tio is low and lookahead becomes very important to 
bring in future observations to help the estimation 
of and Xt ■ 

Lookahead strategies always result in a better es- 
timator provided that the Monte Carlo sample size 
is sufficiently large so that /(A) in (2) is negligi- 
ble. To control computational cost, however, Monte 
Carlo sample size used may not be large enough to 
make /(A) negligible. For a fixed sample size, /(A) 
can increase as A increases. Hence, it is possible 
that lookahead make the performance worse with fi- 
nite Monte Carlo sample size. The following propo- 
sition provides the condition under which one ad- 
ditional lookahead step in the pilot lookahead sam- 
pling method makes the estimator less accurate. 

Specifically, suppose in a finite state system a sam- 
ple set { (xj'i^ , w'f\ ) , j = 1 , . . . , m} properly weighted 
with respect to 7rt_i(xt_i) is available at time t — 



1. At time t, A-step pilots y^^t+k ~ 



.(i) 
t-i' 



Xt 



'^t+i-t+d>.')'> i = 1; • • • 1 i = 1, . . . ,A, are generated 
from distribution (12) with cumulative incremental 
weight 



t,A 



Then the A-step pilot lookahead sampling estimator 
of /i(xt) is 



-I m A 

1 (i) 
— > w)' 

m ^ 

■ ' i=l 



h = - 2^<\Ef^S^Mxa,x, = a.) 

E{hi^t) I yt+A). 



(j l\ 

If we lookahead one more step and draw x^^^j^-^ 
from trial distribution qt+A+i{xt+A+i \ xj-^*^ yt+A+i) 
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d=1 d=2 



d=3 



I I I 



1 2 3 4 5 



1 2 3 4 5 



1 2 3 4 5 



1 2 3 4 5 



Fig. 4. Illustration of adaptive lookahead criterion. 



the proper (A + l)-step cumulative incremental 
weight is 



/(vr^+AlxSii) 

Then the (A + l)-step pilot lookahead sampling es- 
timator of /i(x() is 



A 



j=i i=i 
E{h{xt) I yt+A+i)- 



Proposition 7. Let 

1, . . . , A} and suppose ^[''^i , j = 1, . . . , m, are i.i.d. 
given yt+A ■ When 



m 



var W- 



A 

t~l ^t,A+l 
i=l 



(22) 



ftiXj_ J I x^^^ ,yt+A 



yt+A 



> 1+ 



m 



Yw[E{h{yLt) I yi+A+i) I yt+A], 



we have 



E[(h* - h{^t)f I Yt+A] > E[{h - h{-^t)f I yt+A]. 
The proof is in the Appendix. 



Condition (22) may be difficult to check in prac- 
tice. However, when p(xj | yt+A) = p(xt | yt+A+i), 
that is, yt+ A+i is independent of the current state xt 
given yt+A) the condition always holds since 
var[£;(/i(xt) | yt+A+i) I yt+A] = 0. 

Proposition 7 suggests that, with a fixed number 
of samples, the performance of the SMC estimator 
can be optimized by choosing a proper lookahead 
step. Here we use a heuristic criteria, depicted in 
Figure 4. Suppose that the state space of xt takes 
four possible values and the distribution 7rt+rf(xt) for 
different lookahead (i = 0,1,2,3 is as shown in Fig- 
ure 4, then we can conclude that the information 
available at t (i.e., yt) is not sufficiently strong for 
making inference on xt, and the samples we gener- 
ate for Xt at this time {d = 0) may not be useful as 
the system propagates. However, as d increases, the 
distribution becomes less diffused, showing the ac- 
cumulation of information about xt from the future 
Vt+d- It Ei-lso shows that further lookahead beyond 
c? = 3 is probably not necessary. The details of this 
adaptive criteria are as follows: 

• In a finite state space model, consider lookahead 
steps A = 0, 1, 2, Stop if A > or the esti- 
mated posterior distribution satisfies 



(23) 



max{7rt+A(a;t = Cj)} 

i 



max 



0) T-T-O'.O 

t-l'-'t,A 



2^l,j^t-l'-^t,A 



>Po, 



where N is the maximum number of lookahead 
steps we will perform, < po < 1 is a threshold 
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(7 i) 

close to 1, and are the cumulative incre- 

mental weights defined in (13). 
In a continuous state space model, try lookahead 
steps A = 0, 1, 2, ... . Stop if A > or the esti- 
mated variance var7rt+^(xt) satisfies 



var. 



(24) 



„,,(i) rr(i.i) 

2^i,j^t-l'-'t,A 

(i) rrCi.i) 



where ctq is a given threshold, are samples 

of current state generated from each xj_]^ under 

the pilot scheme and U^ '^ are the corresponding 
cumulative incremental weights. 

Some examples of using adaptive lookahead in fi- 
nite state space models and continuous state space 
models are presented in Section 6. 

6. APPLICATIONS 

In this section we demonstrate the property of 
lookahead and make performance comparisons. In 
all cases, 5, A and A' are used to denote the num- 
bers of lookahead steps in lookahead weighting, ex- 
act lookahead sampling and pilot lookahead sam- 
pling, respectively. 

6.1 Signal Detection over Flat-Fading Channel 

In a digital wireless communication problem (Chen 
and Liu, 2000; Wang, Chen and Guo, 2002), the re- 
ceived signal sequence {yt} is modeled as 

where {xt} is the transmitted complex digital sym- 
bol sequence, {vt} is the white complex Gaussian 
noise with variance o"^ and independent real and 
complex components, and {^j} is the transmitted 
channel, which can be modeled as an ARMA pro- 
cess 

6 + f^lCi-l H \- 4>rCt-r 

= 6oUt + OiUt^i H h OrUt-r, 

where {ut} is a unit white complex Gaussian noise. 
In this example, we assume {S,t} follows the 



ARMA(3, 3) process (Guo, Wang and Chen, 2004) 

- 2.37409et„i + 1.92936et„2 - 0.53208^4-3 

= 10^2(0.89409ut + 2.68227ut_i 

+ 2.68227ut„2 + 0.89409ut_3). 

This system can be turned into a conditional dy- 
namic linear model (CDLM) as follows: 



zt = Fzt-i+gut, 



yt = itxt + vt = h^ztxt + Vt 



where 



1 





••• -cPr 0\ 












, g = 


• •• 1 0) 







V 

Here we consider a high-constellation system with 
a 256-QAM modulation, thus the symbol space is 
A = {ai = (ai,i,ai,2) :ai,i5«i,2 = ±1, ±3, . . . , ±15}, 
where Oj^i and 0,^2 are the real and imaginary parts 
of symbol a^, respectively. We decode {xt} from re- 
ceived {yt} under the framework of the mixture Kal- 
man filter of Chen and Liu (2000) and the "optimal- 
resampling" scheme of Fearnhead and Clifford (2003). 

Because the symbol space is large (|^| = 256), 
we use a combination of the multilevel pilot looka- 
head sampling method and the lookahead weight- 
ing method. The multilevel structure used is simi- 
lar to that of 16-QAM presented in Figure 3. The 
symbol space is divided into subspaces of five dif- 
ferent levels (-L = 4). Hence, at time t, we generate 

(4,^^4,2'4,3'4,4) to obtain x'f^ for given xji^ se- 
quentially. 

To construct the conditional trial distribution 
qt,i{It,i I ^i-i^ ^u-i)^ generate a deterministic pi- 



lot (x 



) 



for every possible It i given 



[xl_^,I^^ , . . . ,I^Y_i) generated. The steps to gener- 
ate the deterministic pilot are as follows: 



• Predict channel Ct hy C^/^ = E{it \ xji''^, yj_i). Let 



(i) 



xf "'" be the symbol aj € Cij^ ^ closest to yt/Ct 
For s = t + 1, . . . ,t + A', repeat the following: 

- Predict channel by C 
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Choose symbol Ui G A closest to Vs/'^s'''^^''^ as 



Letting Ul^'^'^" = 7rt+A'(xJi\,x 
7rt_i(xj'^\), the trial distribution is 



' )/ 



t+l:t+A' 



Qt,l {It,l 



X 



(i) t(J) 
-I'-'t,/- 



i) 



{jJt,i) 



E 



fc:C;.fcCC 



For comparison, SMC without using the multi- 
level structure and lookahead pilot is also consid- 
ered. More computational details of this problem 
can be found in Wang, Chen and Guo (2002). 

In the simulation, the length of transmitted sym- 
bol sequences is 500. To avoid phase ambiguities, 
differential decoding is used. Specifically, suppose 
the information symbol sequence is {dt}. The actual 
transmitted symbol sequence {xt} is constructed as 
follows: given the 256 QAM transmitted symbol xt^i 
and information symbol dj, we first map them to 
four QPSK symbols (raj^.^,!, r2;j_,,2, -Txt-i.s, 'rxt_i,4) 
and {rdt,i,rdt,2,rdt,3,rdt,i), respectively. Let r^^^.i = 



1,2,3, 4, and we map these four QPSK 



symbols {rxt,i,rxt^2,rxt,2.-,rxt,i) back to 256-QAM as 
the transmitted symbol xt- The differential receiver 

where (a;(_i,Xf) are es- 




20 25 30 
SNR(clB) 



calculates . 

dt,t 

timated {xt-.i,xt) at the receiver, then decodes the 
information symbol dt as the 256-QAM symbol cor- 
responding to ir^^^^,r^^^^, r^^ 3, r^^ ,^). To improve the 
decoding accuracy of this high-constellation system, 
we also insert 10% symbols that are known to the 
receiver into the transmitted symbol sequences peri- 
odically. The experiment is repeated 100 times. A to- 
tal of 50,000 symbols (400,000 bit information) are 
decoded. 

Figure 5 reports the bit-error-ratio (BER) perfor- 
mance of a different lookahead step 6 of the looka- 
head weighting method with standard concurrent 
SMC sampling (A = 0, A' = 0). m = 200 samples are 
used. It is seen that the BER performance does not 
improve further after 6 >8 lookahead steps. We use 

= 10 in the following comparison. 

BER performance of pilot lookahead sampling meth- 
ods with different lookahead steps A' is shown in 
Figure 6. The number of Monte Carlo samples is 
adjusted so that each method takes approximately 
the same CPU time. From the result, it is seen that 
the multilevel pilot lookahead sampling method with 
A' = 1 has smaller BER than SMC without using 



Fig. 5. BER performance of the lookahead weighting method 
with A = 0, A' = 0, m = 200 and different S m a 256-QAM 
system. 

lookahead pilots. But when we use A' = 2, the per- 
formance is worse. One of the reasons is that we 
use the predicted channel to construct the pilot, 
which could be very different from the true chan- 
nel and severely mislead the sampling, especially 
when the number of lookahead steps is large. We 
also implement the adaptive method. Here we use 
adaptive stop criteria (23) with po = 0.90. The re- 
sulting average number of lookahead steps is 0.195. 
Due to the saving in the smaller number of looka- 




8=10,A'=0,m=200 

8=10,A'=1,m=160 
^ S=10,A'=2,m=140 
^ S^10,adpt A'(^0.195),m^180 



10 



15 



20 



25 30 
SNR(dB) 



Fig. 6. BER performance of the multilevel pilot lookahead 
sampling method with A = 0, (5 = 10 but different A' and num- 
ber of samples m m the 256-QAM system. The number of 
samples are chosen so that each of the methods takes approx- 
imately the same CPU time. 
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Table 1 

Average RMSEi for SMC with different lookahead methods. The same numbers of samples (m — 3000^ 
are used in different methods. We use a single pilot lookahead (K = 1) unless stated otherwise. 
Average lookahead steps in the adaptive lookahead method are reports in the parentheses 



A' + S 



RMSEi 





1 


2 


3 


5 


7 


Time (sec.) 


SMC (A' = 0) 


3.128 


1.011 


0.828 


0.817 


0.818 


0.819 


0.113 


SMC (A' = 1,A^W,K ^16) 




1.009 


0.824 


0.813 


0.812 


0.813 


5.952 


SMC (A' = l,yl = 3) 




1.011 


0.831 


0.826 


0.831 


0.839 


0.319 


SMC (A' = 2, A = 3) 






0.838 


0.844 


0.860 


0.876 


0.405 


SMC (A' = 3,yl = 3) 








0.846 


0.885 


0.913 


0.504 


SMC-S (A' = 1,^ = 1) 




1.009 


0.825 


0.815 


0.814 


0.815 


0.170 


SMC-S (A' = 2,^ = 1) 






0.825 


0.815 


0.815 


0.815 


0.197 


SMC-S (A' = 3,^ = 1) 








0.816 


0.816 


0.816 


0.224 


SMC-S (adptA'(0.244),A = 1) 


0.995 


0.834 


0.815 


0.814 


0.816 


0.817 


0.147 


SMC-S (A' = 1,A = 3) 




1.009 


0.824 


0.813 


0.813 


0.813 


0.421 


SMC-S (A' = 2, A = 3) 






0.824 


0.813 


0.813 


0.813 


0.498 


SMC-S (A' = 3, A = 3) 








0.814 


0.814 


0.813 


0.576 



head steps, larger Monte Carlo sample size is used 
with the same computational time. Its BER perfor- 
mance is slightly better than using the fixing pilot 
lookahead step A' = 1. 

6.2 Nonlinear Filtering 

Consider the following nonlinear state space model 
(Gordon, Salmond and Smith, 1993): 

state equation: 

xt = O.Bxt^i + 25xt^i/{l + 

-h8cos(1.2(t - l)) + ut, 

observation equation: yt = xf /20 + Vt., 

where ~ A^(0, cr^), ~ A^(0,r/^) are Gaussian 
white noise. In the simulation, we let a = 1 and 
r] = 1, and the length of observations is T = 100. 
We compare the performance of different lookahead 
strategies. 

In this nonlinear system, TTt{xt \ Xf_i) cannot be 
easily sampled from. Here we use the simple trial 
distribution 



qt{xt I xt_i) =TTt^i{xt I X(_i) =gt{xt \ xt^i) 



and 



pilot / 



qr"^ixs I Xs_i) = gs{xs I Xs-i). 

We use SMC to denote the pilot lookahead sampling 
method for the continuous state space case. The im- 
plementation with smoothing step presented in Sec- 
tion 4.4 is denoted as SMC-S. A simple piecewise 



constant function with interval width 0.5 is used for 
smoothing. Resampling is applied at every step. 

We repeat the experiment 1000 times. The good- 
ness-of-fit measures used are 



RMSEi 



1 ^ 
T 



{xt - XtY 



and 



RMSE2 



. t=i 



t=i 



1/2 



,{xt)y 



1/2 



where RMSE2 is a measurement of estimation vari- 
ance, I {5 + A') in (2). Here £^7^^^^, (xj) is obtained 
by SMC (A' = 0) with a large number of samples 
(m = 200,000) and the lookahead weighting method 
with lookahead steps S* = S + A' . Tables 1 and 2 
report average RMSEi and RMSE2 and the associ- 
ated CPU time of using different sampling methods 
and m = 3000 samples. It can be seen that the de- 
layed methods can greatly reduce RMSEi for small 
5 + A' , but no further improvement can be found 
when 6 + A'>3. SMC with A' = l,A = 10,K = 16 
is an approximation of the exact lookahead sampling 
method with A = 1. It has the smallest RMSEi at 
the cost of extensive computation, which confirms 
Proposition 3. The performance of SMC with a sin- 
gle pilot {K = 1) is poor because the future state 
space cannot be efficiently explored by the small 
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Table 2 

Average RMSE2 for SMC with different lookahead methods. The same numbers of samples (m — 3000^ 

are used in different methods 



A' + S 



RMSE2 





1 


2 


3 


5 


7 


Time (sec.) 


SMC (A' = 0) 


0.137 


0.055 


0.057 


0.066 


0.078 


0.090 


0.113 


SMC (A' = 1,A^10,K ^16) 




0.023 


0.027 


0.032 


0.038 


0.043 


5.952 


SMC (A' = l,yl = 3) 




0.070 


0.105 


0.138 


0.174 


0.203 


0.319 


SMC (A' = 2,^ = 3) 






0.156 


0.220 


0.278 


0.326 


0.405 


SMC (A' = 3, A = 3) 








0.240 


0.356 


0.417 


0.504 


SMC-S (A' = = 1) 




0.043 


0.048 


0.053 


0.062 


0.072 


0.170 


SMC-S (A' = 2,^ = 1) 






0.051 


0.063 


0.066 


0.075 


0.197 


SMC-S (A' = 3,A = 1) 








0.073 


0.081 


0.090 


0.224 


SMC-S (A' = 1,^ = 3) 




0.029 


0.032 


0.036 


0.041 


0.048 


0.421 


SMC-S (A' = 2,^ = 3) 






0.031 


0.039 


0.042 


0.047 


0.498 


SMC-S (A' = 3,^ = 3) 








0.045 


0.050 


0.055 


0.576 



number of pilots. With the smoothing step, SMC- 
S can achieve better performance than the simple 
lookahead weighting method (SMC, A' = 0). SMC- 
S with A = 3 has better performance than SMC-S 
with A = l, because when using A = l, the pilot only 
affects resampling and estimation, but not the sam- 
pling procedure. However, SMC-S with A = 3 also 
takes a longer CPU time. 

We also use the adaptive stop criteria (24) (adpt) 
to choose the lookahead steps adaptively. In the cri- 
teria, we let ctq = 4. The adaptive method has sim- 
ilar performance to the fixed-step pilot lookahead 
sampling method, but much fewer average looka- 
head steps (average lookahead steps are only 0.244) 
and less CPU time. 



For a fair comparison. Tables 3 and 4 report aver- 
age RMSEi and RMSE2 of different methods with 
different numbers of samples, which are chosen so 
that each method used approximately the same CPU 
time. In this table, SMC with A = 1 and the adap- 
tive lookahead scheme has the smallest RMSEi , which 
demonstrates the effectiveness of the adaptive looka- 
head strategy. It also shows that SMC-1 with A' = 
1, A = 10, i^T = 16 has a large RMSEi, because of its 
high computational cost per sample. 

6.3 Target Tracking in Clutter 

Consider the problem of tracking a single target in 
clutter (Avitzour, 1995). In this example, the target 
moves with random acceleration in one dimension. 



Table 3 



Average RMSEi for SMC with different lookahead methods. The 


numbers 


of samples 


are chosen 


so that each method used 


approximately the same CPU time. Average lookahead steps in tht 


I adaptive 


lookahead 


method are reports in 


the parentheses 
















RMSEi 





1 


2 


3 


5 


7 


Time (sec.) 


SMC (m = 3000, A' = 0) 


3.128 


1.011 


0.828 


0.817 


0.818 


0.819 


0.113 


SMC (m = 60, A' = 1, yl = 10, Jf = 16) 




1.079 


0.911 


0.906 


0.912 


0.920 


0.125 


SMC-S (m = 2000, A' = 1, 4 = 1) 




1.010 


0.826 


0.817 


0.817 


0.818 


0.117 


SMC-S (m = 1700, A' = 2,A = 1) 






0.827 


0.818 


0.817 


0.819 


0.116 


SMC-S (m = 1500, A' = 3, yl = 1) 








0.820 


0.822 


0.823 


0.118 


SMC-S (m = 2400, adpt A' (0.245), A = 1) 


0.994 


0.835 


0.816 


0.815 


0.817 


0.818 


0.104 


SMC-S (m = 800, A' = 1, yl = 3) 




1.015 


0.832 


0.821 


0.821 


0.822 


0.108 


SMC-S (m = 700, A' = 2, yl = 3) 






0.827 


0.817 


0.816 


0.817 


0.111 


SMC-S (m = 600, A' = 3, yl = 3) 








0.819 


0.819 


0.820 


0.119 
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Table 4 

Average RMSE2 for SMC with different lookahead methods. The numbers of samples are chosen so that 
each method used approximately the same CPU time 



A' + 5 



RMSE2 







1 


2 


3 


5 


7 


Time (sec.) 


SMC (m = 


3000, A' = 0) 


0.137 


0.055 


0.057 


0.066 


0.078 


0.090 


0.113 


SMC (m = 


60, A' = = lO.if^ 16) 




0.228 


0.254 


0.277 


0.306 


0.334 


0.125 


SMC-S (m 


= 2000, A' = 1,4 = 1) 




0.054 


0.058 


0.064 


0.075 


0.087 


0.117 


SMC-S (m 


= 1700, A' = 2, 4 = 1) 






0.066 


0.083 


0.085 


0.098 


0.116 


SMC-S (m 


= 1500, A' = 3, 4 = 1) 








0.103 


0.114 


0.126 


0.118 


SMC-S (m 


= 800,A' = 1,4 = 3) 




0.062 


0.067 


0.074 


0.084 


0.096 


0.108 


SMC-S (m 


= 700, A' = 2, 4 = 3) 






0.061 


0.078 


0.082 


0.094 


0.111 


SMC-S (m 


= 600, A' =3,4 = 3) 








0.097 


0.109 


0.121 


0.119 



The state equation can be written as 



Xt,l 



1 1 
1 



+ 



1/2 
1 



ut, 



where xt^i and xt^2 denote the one-dimensional lo- 
cation and velocity of the target, respectively; ut ~ 
A^(0,cr^) is the random acceleration. 

At each time t, the target can be observed with 
probability pd independently. If the target is ob- 
served, the observation is 

where vt ~ N{0,r'^). 

In additional to the true observation, there are 
false signals. Observation of false signals follows a 
spatially homogeneous Poisson process with rate A. 
Suppose the observation window is wide and centers 
around the predicted location of the target. Let D 
be the range of the observation window. The actual 
observation yt includes rit detected signals, among 
which at most one is the true observation. There- 
fore, rif follows a Bernoulli(prf) -|- Poisson(A£') dis- 
tribution. 

Define an indicator variable It as follows: 

{0, if the target is not detected at time t, 
k, if the kth signal in yt 
is the true observation, 

then we have 

p{yt,It I xt) 

(l-prf)A, if/t = 0, 
oc { p^(2^r2)-i/2gxp{-(yi,fc -xt)V2r2}, 
iilt = k>0. 



In this system, given It = {h, . . . , It), it becomes a 
linear Gaussian state space model. In such a system, 
the mixture Kalman filter (MKF) can be applied. 

The mixture Kalman filter only generates samples 

(7) 

of the indicators and considers the state space as 

(7) 

discrete. Conditional on and y^, the state vari- 
able Xt is normally distributed. The mean and the 

variance of p{xt-s \ l['^\ yt) can be exactly calculated 
through the Kalman filter. To perform lookahead 
strategies in MKF, suppose we can obtain samples 
{(■'■1+A' W)' i = li • • • 1 ""T-} properly weighted with re- 
spect to -Kt+A{h+A) = p(Jt+A I yt+A), then 



E 



rO') ^ 

h+A) 



E 



is a consistent estimator of Ej^^^^ (xts), 6 = 0,1,.... 
More details of MKF and MKF with lookahead can 
be found in Chen and Liu (2000) and Wang, Chen 
and Quo (2002). 

In this example, we can also use the smoothing 
step presented in Section 4.4 to improve the perfor- 
mance of the pilot lookahead sampling method. It 



— (7 i) 

can be shown that V 



r(i) 



It 



t+l:t+A — ^(^t+l:t+A I H-l^-^t 

i,yt+A) in (16) only depends on the mean /Up'*^ and 
the variance Sp' ^ of the normal distribution p{xt \ 
^i-i^^t = 'i',yt)- For simplicity, we approximately as- 
sume 1^1+1,4+ A only depends on i^'f''^ = {Ht,i\ l^t,2^)^ 
that is. 



t+l:t+A 



V t+l:t+A[fJ-t J+^t ■ 



— (7 *) 

We then use the smoothed Vl+i-t+A ^° reduce the 
variation introduced by random pilots. We denoted 
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Table 5 

MAEi and MAE2 for different lookahead methods. The same numbers of samples (m = 2G0) are used. 
The CPU time used in each experiment is 0.341 seconds for the lookahead weighting method (MKF, A = OJ; 
3.554 seconds for the exact lookahead sampling method (MKF, A = 3); 0.783 seconds for the pilot lookahead 
sampling method (MKF, A' = 3), and 0. 788 seconds for MKF-S (A' = 3) 



A + S/A' + S 










1 


2 


3 


5 


8 


10 


13 


15 


MAEi 


MKF (A = 0) 


1.0300 


0.7890 


0.6560 


0.5830 


0.5180 


0.4750 


0.4590 


0.4470 


0.4450 




MKF (A = 3) 








0.5780 


0.5150 


0.4710 


0.4540 


0.4410 


0.4370 




MKF (A' = 3) 








0.5780 


0.5150 


0.4730 


0.4560 


0.4440 


0.4420 




MKF-S (A' = 3) 








0.5730 


0.5120 


0.4690 


0.4530 


0.4410 


0.4370 


MAE2 


MKF (A = 0) 


0.0932 


0.0760 


0.0618 


0.0525 


0.0460 


0.0453 


0.0467 


0.0500 


0.0520 




MKF (A = 3) 








0.0463 


0.0357 


0.0307 


0.0298 


0.0298 


0.0305 




MKF (A' = 3) 








0.0575 


0.0503 


0.0472 


0.0480 


0.0503 


0.0525 




MKF-S (A' = 3) 








0.0490 


0.0398 


0.0360 


0.0353 


0.0365 


0.0375 



this method by MKF-S. We used the piecewise cons- 

— (i i) 

tant smoother to estimate Vt+i-t+A - smoother, 
the space [min{/i[Y^}, max{/i[Y^}] x [m.in{fi[^2^}, 

max{/i|"'2*''}] is divided into 10 x 10 equal parts. 

In this example, we let = 0.1, = 1.0, pd = 0.8, 
A = 0.1, and D = lOOr. The length of the observa- 
tion period is T = 100. We repeat the experiment 
500 times. The resampling step is applied when the 
effective sample size is less than O.lm. 

Following Avitzour (1995), we use the median ab- 
solute error (MAE) as the performance measure- 
ment. Define 

MAEi = median{|xt_i — xt,i\} 

and 

MAE2 = median{|xt,i - £',rt+A+5(^t,i)l}> 
where xt^i is the consistent estimation of 
-^Tt+A+a using different lookahead methods, and 

^TTt+A+si^t^) is obtained by the lookahead weight- 
ing method using a large number of samples (m = 
20,000). 

We first compare the performance of different look- 
ahead methods using the same number of samples 
(m = 200). Table 5 reports MAEi and MAE2 for the 
lookahead weighting method (MKF, A = 0), the ex- 
act lookahead sampling method (MKF, A = 3) and 
the single pilot lookahead sampling method (MKF, 
A' = 3 and MKF-S, A' = 3). From the result, MAEi 
decreases as the number of lookahead steps increases, 
which shows the effectiveness of the lookahead strate- 
gies. The exact lookahead sampling method (MKF, 
A = 3) has the smallest MAE2, which confirms Pro- 



positions 3 and 5, although its computational cost 
is the highest. We can also see that MKF-S (A' = 3) 
performs better than MKF (A' = 3). 

Then we compare the performance of different 
methods under similar computational cost. The num- 
ber of samples is adjusted so that each method takes 
approximately the same CPU time. Table 6 reports 
the quantiles of absolute estimation errors \xt,i — 
xt^i \ for different lookahead methods with lookahead 
steps A + 6 = 15 (or A' + 5 = 15). The performance 
does not improve further when A' + d > 15. Under 
the same CPU time, the lookahead sampling method 
has the largest absolute estimation error because 
of its high computational cost. The pilot lookahead 
sampling method (MKF-S, A' = 3) has better per- 
formance than the simple lookahead weighting meth- 
od (MKF, A = 0). We then use the stop criteria 
(24) to choose lookahead steps in the pilot lookahead 
sampling method adaptively (MKF-S, adptA'). 
When we set ctq = 1.5r^ in criteria (24), the average 
number of lookahead steps is 1.572. The result shows 
that the adaptive pilot lookahead sampling method 
performs the best under the same CPU time. 

APPENDIX 

Proof of Proposition 1. For any A2 > Ai > 
0, we have 

E^,[E{h{xt)\yt+A,)-h{^t)f 

= E^^[E{h{yLt) |yt-KAi) -E{h{jct) lyi+Aa)]^ 
+ E^^[E{h{^t) \yt+A,) - h{^t)f 
+ 2E^,{[E{hi^t) I yt+A,) - Eihi^t) I yt+A,)] 
•[^(Mxt)|yt+Aj-Mxt)]}. 
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Table 6 

Quantiles of absolute estimation errors \xt,i — xt,i\ for different lookahead methods. The numbers of samples 
are chosen so that each method used approximately the same CPU time. In the adaptive pilot lookahead 
sampling method (MKF-S, adpt A'j,t/ie average number of lookahead steps is 1.572 



Quantiles (A + (5 = 15) 





0.05 


0.25 


0.50 


0.75 


0.95 


Time (sec.) 


MKF (m = 450,A = 0) 


0.0400 


0.2040 


0.4420 


0.7910 


1.7885 


0.791 


MKF (m = 50,A = 3) 


0.0400 


0.2080 


0.4490 


0.8120 


2.1610 


0.851 


MKF-S (m = 200, A' = 3) 


0.0390 


0.2030 


0.4370 


0.7790 


1.6590 


0.788 


MKF-S (m = 280, adptA') 


0.0390 


0.2020 


0.4340 


0.7700 


1.6295 


0.802 



Because 

E^,{[E{h{-^t) I yt+Ai) - E{h{-^t) I yt+Aj] 
■ [E{h{-^t)\yt+ii,)-K^t)]} 

= E{E{[Eihi^t) I yt+A,) - E{h{^t) I yt+A,)] 

• [E{h{yLt) I yt-fAa) - K^t)] I yt-KAa} I Yt} 

= 0, 

the conclusion holds. □ 
Proof of Proposition 2. It is easily seen that 



(2j), / {2J) 



■-E. 



iJ) 

t-1" 



7rt-fA(xJ 



{2,j), 



T^t-l[^t-l)Qt[^t |X 



{2J) I (j) 



t-1) 



•Mxf'^'^)ixa 



: W 



I (i) ^ 

(j) 7rt+A(x^„i. 
vrt-i(<Ji) 



• Ett 



(2 j) X T^t+AyXj I X^„J 



„(2j) I (j) 



X 



X 



(i) 
t-1 



(i) 



(lt{x\ 

i^'^^ E^t+AiK^t) I xt_i =X4_i, 

Then (10) is a direct result of Rao-Blackwellization. 
By replacing /i(xf) with E„^_^_^{h{xt) \ xj_i) and 
/i(xt) = 1 in (10), we obtain (11) and (9), respec- 
tively. □ 

Proof of Proposition 3. Since 

^■^t+Ai^t ri{^t-i^^t J|Xj_i,a:j i 



w 



t~i 



vr.-i(xa)mtf^s(xP^|xS) 



X 



(j) ^(2,J) 



(i) ^(2.i)N„„{i) 



It;. 



vr,+A(xa,xf'^-)) 



t-1 



vrj-i(xSi\)gt(a; 



„(2j) I ^(i) 



X 



t-ii 



|^t+A(xJ{?,+^ |xji\. 



Tt+A 



(i) ^(2,i)^ 



n 



t-l-A 

<i=t-i-i 9s 



{x. 



{3J) I ^(3,i)^ 

I Xs_l J 



(i) ^(2,i) 



we have 



[u;f'^-)/.(xa,x?^-))] 



(2,i),. (i) ^(2,j)^ 



>var^t+AK '"h{i^\%x 
according to the Rao-Blackwellization theorem. □ 



Proof of Proposition 4. To prove (xj' 



(i) 



w 



aux(j)^ 



is properly weighted with respect to dis- 



tribution T^t+Ai^t), we only need to prove 



E^t+Ai'^t 



auxO'), / (i) 



h{-s.\^>)\=E^Ah{-xt)]. 



According to the sampling distribution of the looka- 
head pilot 'x.):J^I^ and calculation of the corresponding 
cumulative incremental weight U^ 



{hi) 



,,aux(j), / 



X 



(i) 1 



E. 



A 



w 



i=l 



■ W 



U) 7rt+A(x[i\,3;t = ai)/i(x[i\,xt = a^) 



i=l 



vrt-i(xSi\) 
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-i?^,^J/i(xi)|xSi\]. 



Because sample (xj'^^ , ) is properly weighted 
with respect to 7rt_i(xt_i), 



E. 



U) 7rf+A(x|i\] 
^ (i) \ 



" 7rt+A(xf-i) 
_ 7rt_i(xt„i) 



the proposition follows. □ 



i?^,^^(/i(xi)|xSi\) 



TTt+A 



Proof of Proposition 5. Because w 



aux{j) 



and 



(iJ) 



both are importance weights, we have 



aux{j)N 



var. 



^-t+A(^f''^) = l- Hence, 



var. 



(w 



aux(j)N2 



aux(i)x2 I ^(j) 



Now we consider the difference between 

i?..-,A[(^r^^'¥ I xSi\] and i?.,,A[(^f "^)^ I x^]. 
Let 



Because 



F rr/O''''*^) I Y^-''^ 1 



7rt+A(x|i\,xj = m) 
vr*-i(xa) 



7rt+A(xji\,Xt = Oi 



vr*-i(xli^i) 



we have ii^7rt+A [^^"''*''^^ I — 0- In addition, e^-?''*'*^), 
i = l,...,^, fc = l,...,i^ are independent condition- 
al on x|{_^-^, and for fixed i, e(-?'*''^), A: = 1, . . . , A', fol- 
low the same distribution: 



A 

E 



7rt+A(x|i\,Xt = ai) 
vrt-i(xa) 



a: 

i=l fc=l 

(i) 



/ 7r,+A(x^i\) 
Vvr,„i(xa) 



X 



(i) 



vt=l fc=l 



X 



(i) 



a:2 

i=l fc=l 

...aK-P'^'^IxS] 



a: 



Then we have 



i=l 



E. 



aux(j)N2 I (j) 



E. 



T^t+A 



[rf'^'¥|xa 



A 



K 



i=l 



hence, 



< var 



(w 



aux(j)N^ 



var. 



TTt + A V^t 



(to 



~0(1/K). 
With a similar method, we can prove 



< var 



Tt+A 



i=l k=l 



var,ri+A[^t ''^^TTt+A^ 



r^o{i/K). n 

Proof of Proposition 6. Let V'i(xt) oc rt(xt) • 
bt{xt) be the distribution of samples after resam- 
pling. Because Et^^^,^{w^^^) = 1, we only consider 
minimizing E'^^^^ (wjl'y)^. We have 



E. 



t+T 



^t(xt) 



• JJ qs{xs\xs^i)dxtdxt+i---dxt+T 

s=t+l 

According to Jensen's inequality, to minimize 
-^TTt+Tl^t+r)^' '^ti'^t) needs to be proportional to 
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Proof^of Proposition 7. For estimator h*, 
because h* and h{xt) are independent conditional 
on yt+A+i, we have 

E[{h*-h{^t)f\yt+A] 

= E{E[[h* - E{h{^t) \yt+A+i) 

+ E{h{yLt) I yt+A+i) - K^t)f I yt+A+i] 1 Yi+A} 
= E[vai(h* I yt+A+i) I Yi+a] 

+ E[[E{h{^t) I yt+A+i) - h{-^t)f I yt+A] 

and for estimator /i, 

£;[(/i-h(xi))'|yt+A] 
= var[/i I yt+A\ 

+ £;[[^(/i(xO I y^+A) - h{^t)f I yt+A]. 
Similar to the proof of Proposition 4, we have 
E[[E{h{i,t)\yt+A) -h{^,)f \ yt+A\ 

- E[[E{h{-^t) I yt+A+i) - K^tf I yt+A] 

= var[^(/i(xt) I yt+A+i) I Yt+A]- 
Hence, 

- h{-^t)f I yt+A] - E[{h - h{xt)f I y^+A] 
= £'[var(K* I yt+A+i) I Jt+A] - var[/i | Ji+a] 
- vaT[E{h{xt) I yt+A+i) I yt+A] 
/ A 



1 



-E 



m 



var w 



i=l 



•/i(xj^?) |yt+A+i 



yt+A 



(25) 



■ var 



m 



w 



i=l 



1 

m 



wai[E{h{yit) \ yt+A+i) | yt+A] 
A 



var 



-l^iE^;i^(-t-i)i3^*+A 



j=i 



■ Yai[E{h{y.t) \ Yt+A+i) \ Yi+a] 
A 



var 



w 



iE^S^Mxl!:V)|y.+A 



i=l 



■ var[£'(/i(xt) | yt+A+i) I yt+A] 



— < var 

m 



-ai;f^^:liMx&?)iy.+A 

i=l 
A 



var 



w 



ii) 



i=l 



1 + — ) var[£'(/i(xt) | yt+A+i) | yi+A]- 



In the pilot lookahead sampling method, because 



ttUA _ TT{j,i)^/Ahi) I , \ 



^'(xl+A I yi+A) 



• gt+A+Ua^i+A+1 I X^+A'^i+A+l 



we have 



^(f^i:liMxi^''oixjix,yt+A) 
=c^i:^Mxp)) 



t+A+1 1 yt+A+1, 



/(p(x(^-'^) 



t+A I y<+Aj 

%+A+i(2;S^*]^_^i I 2;S^*l,yf+A+i))) 



• 9t+A+l(a^l+A+l I ^^t+Lyt+A+l) 
■ p{yt+A+i I yt+A) dx'fj^l^^^ dyt+A+1 



according to the Rao-Blackwellization theorem, 
A 



var 



w. 



1 ^i.A+i^-f-X^.^ j I yi+A 



var 



(26) 



w 



U) 



tVf^A "lx^_;^ j I yt+A 



var w. 



(j) 



i=l 
A 

1=1 



t,A+l 



yi+A 



'iix^_^ J I Xj_^^ ,yt+A 

Combining (25) and (26), the conclusion holds. □ 
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